#Tasks
Using sample size n = 10 with lower limit 0 and upper limit 5 with runif.
## [1] 0.4518971 1.2418666 0.4659493 0.8642035 2.0549938 2.0686308 0.9329956
## [8] 0.4070480 3.0820802 4.3346579
Getting mean and variance.
## [1] 2.5
## [1] 2.083333
#Using the sample from first part to calculate the sample mean and sample variance
sampleMean = mean(result)
sampleVariance = var(result)
sampleMean## [1] 1.590432
## [1] 1.693615
#Comparing the sample mean and sample variance to the theoretical mean and variance
sampleMean - mean## [1] -0.9095677
## [1] -0.3897188
The sample mean and sample variance are close to the theoretical mean and variance.This makes sense as random samples from a uniform distribution should be close to the theoretical mean and variance.
## [1] 25
## [1] 20.83333
## [1] 2.5
## [1] 0.2083333
myclt=function(n, iter){
y = runif(n * iter, 0, 5) # A
data = matrix(y, nr = n, nc = iter, byrow = TRUE) # B
sm = apply(data, 2, sum) # C
hist(sm)
sm
}
w = myclt(n=10, iter=10000) # DLine A of this function generates a random sample of size n * iter with lower limit 0 and upper limit 5. Line B creates a matrix with n rows and iter columns where each column is a sample of size n. Line C calculates the sum of each column of the matrix. Line D calls the function with n = 10 and iter = 10000.
#Using mean() and var() for sample estimates
sampleMeanW = mean(w)
sampleVarianceW = var(w)
sampleMeanW## [1] 25.00037
## [1] 20.74327
Updated myclt:
myclt=function(n, iter){
y = runif(n * iter, 0, 5) # A
data = matrix(y, nr = n, nc = iter, byrow = TRUE) # B
sm = apply(data, 2, mean) # C - using mean now. Only change needed.
hist(sm)
sm
}
wmean = myclt(n=10, iter=10000)More estimates:
## [1] 2.505727
## [1] 0.2062948
mycltu() function brought over for visibility
### CLT uniform
## my Central Limit Function
## Notice that I have assigned default values which can be changed when the function is called
mycltu=function(n,iter,a=0,b=10){
## r-random sample from the uniform
y=runif(n*iter,a,b)
## Place these numbers into a matrix
## The columns will correspond to the iteration and the rows will equal the sample size n
data=matrix(y,nr=n,nc=iter,byrow=TRUE)
## apply the function mean to the columns (2) of the matrix
## these are placed in a vector w
w=apply(data,2,mean)
## We will make a histogram of the values in w
## How high should we make y axis?
## All the values used to make a histogram are placed in param (nothing is plotted yet)
param=hist(w,plot=FALSE)
## Since the histogram will be a density plot we will find the max density
ymax=max(param$density)
## To be on the safe side we will add 10% more to this
ymax=1.1*ymax
## Now we can make the histogram
hist(w,freq=FALSE, ylim=c(0,ymax), main=paste("Histogram of sample mean",
"\n", "sample size= ",n,sep=""),xlab="Sample mean")
## add a density curve made from the sample distribution
lines(density(w),col="Blue",lwd=3) # add a density plot
## Add a theoretical normal curve
curve(dnorm(x,mean=(a+b)/2,sd=(b-a)/(sqrt(12*n))),add=TRUE,col="Red",lty=2,lwd=3) # add a theoretical curve
## Add the density from which the samples were taken
curve(dunif(x,a,b),add=TRUE,lwd=4)
}
mycltu(n=20,iter=100000)
For w=apply(data, 2, mean), the apply function uses the 2 to specify
that the function should be applied to the columns of the matrix.
Basically like a for loop that goes through each column of the matrix
and applies the function to each column.
The total terms in w is the iter variable as it is the number of columns for the matrix. This is 100000 in this example.
sd uses that specific formula because that formula is the one for theoretical standard deviation of the sampling distribution which is sd = (b - a) / (sqrt(12 * n) and is the same as in the equation given. The formula is a combination of Variance = (b-a)^2 / 12 and standard deviation that equals sqrt(variance/n).
Plots using given parameters:
The theorem doesn’t really work for when n = 1 as shown by the plot.
When n becomes 2 however, it is looking more how it should be as in a
triangular distribution. As n increases, the distribution becomes more
normal as expected. Therefore CLT is working as expected.
mycltb() function brought over for visibility
## CLT Binomial
## CLT will work with discrete or continuous distributions
## my Central Limit Function
## Notice that I have assigned default values which can be changed when the function is called
mycltb=function(n,iter,p=0.5,...){
## r-random sample from the Binomial
y=rbinom(n*iter,size=n,prob=p)
## Place these numbers into a matrix
## The columns will correspond to the iteration and the rows will equal the sample size n
data=matrix(y,nr=n,nc=iter,byrow=TRUE)
## apply the function mean to the columns (2) of the matrix
## these are placed in a vector w
w=apply(data,2,mean)
## We will make a histogram of the values in w
## How high should we make y axis?
## All the values used to make a histogram are placed in param (nothing is plotted yet)
param=hist(w,plot=FALSE)
## Since the histogram will be a density plot we will find the max density
ymax=max(param$density)
## To be on the safe side we will add 10% more to this
ymax=1.1*ymax
## Now we can make the histogram
## freq=FALSE means take a density
hist(w,freq=FALSE, ylim=c(0,ymax),
main=paste("Histogram of sample mean","\n", "sample size= ",n,sep=""),
xlab="Sample mean",...)
## add a density curve made from the sample distribution
#lines(density(w),col="Blue",lwd=3) # add a density plot
## Add a theoretical normal curve
curve(dnorm(x,mean=n*p,sd=sqrt(p*(1-p))),add=TRUE,col="Red",lty=2,lwd=3)
}
mycltb(n=5,iter=10000,p=0.5)Using given parameters:
Using p=0.7 for the previous parameters:
Using p=0.5 for the previous parameters:
The plots are all very similar even with the p values changing. This indicates the sample means are following similar behaviors and is in line with the CLT theorem. Since the number of iterations is high the results, despite the p values changing, would still end up with similar behaviors/distributions. Looking at the shape of them all, they are all holding the same shape so the CLT theorem is working. In this plot below with a smaller number of iterations, the shape of the graph is very different because of this smaller amount of iterations.
mycltp() function brought over for visibility
## CLT Poisson
## CLT will work with discrete or continuous distributions
## my Central Limit Function
## Notice that I have assigned default values which can be changed when the function is called
mycltp=function(n,iter,lambda=10,...){
## r-random sample from the Poisson
y=rpois(n*iter,lambda=lambda)
## Place these numbers into a matrix
## The columns will correspond to the iteration and the rows will equal the sample size n
data=matrix(y,nr=n,nc=iter,byrow=TRUE)
## apply the function mean to the columns (2) of the matrix
## these are placed in a vector w
w=apply(data,2,mean)
## We will make a histogram of the values in w
## How high should we make y axis?
## All the values used to make a histogram are placed in param (nothing is plotted yet)
param=hist(w,plot=FALSE)
## Since the histogram will be a density plot we will find the max density
ymax=max(param$density)
## To be on the safe side we will add 10% more to this
ymax=1.1*ymax
## Make a suitable layout for graphing
layout(matrix(c(1,1,2,3),nr=2,nc=2, byrow=TRUE))
## Now we can make the histogram
hist(w,freq=FALSE, ylim=c(0,ymax), col=rainbow(max(w)),
main=paste("Histogram of sample mean","\n", "sample size= ",n," iter=",iter," lambda=",lambda,sep=""),
xlab="Sample mean",...)
## add a density curve made from the sample distribution
#lines(density(w),col="Blue",lwd=3) # add a density plot
## Add a theoretical normal curve
curve(dnorm(x,mean=lambda,sd=sqrt(lambda/n)),add=TRUE,col="Red",lty=2,lwd=3) # add a theoretical curve
# Now make a new plot
# Since y is discrete we should use a barplot
barplot(table(y)/(n*iter),col=rainbow(max(y)), main="Barplot of sampled y", ylab ="Rel. Freq",xlab="y" )
x=0:max(y)
plot(x,dpois(x,lambda=lambda),type="h",lwd=5,col=rainbow(max(y)),
main="Probability function for Poisson", ylab="Probability",xlab="y")
}
mycltp(n=10,iter=10000)Overall there should be similar results to the binomial distribution. This comes from the equation where lambda = np. Since p is not shown in this example then the equation lambda/n = p can be used. As an example 4/20 = 0.2 and 4/2 = 2. For the higher lambda values, 10/20 = 0.5 and 10/2 = 5. Although the p values differ by a good amount still, the results should be similar to the binomial distribution as binomial is for the number of successes and poisson is the number of events in a fixed interval of time.
Using given parameters:
Now using lambda = 10 for the previous parameters: